Diachronic vocabulary adaptation for b

Author

  • Alexandre Allauzen
Abstract

This article investigates the use of Internet news sources to automatically adapt the vocabulary of a French and an English broadcast news transcription system. A specific method is developed to gather training, development and test corpora from selected websites and to normalize them for further use. A vectorial vocabulary adaptation algorithm is described that interpolates word frequencies estimated on adaptation corpora to directly maximize lexical coverage on a development corpus. To test the generality of this approach, experiments were carried out simultaneously in French and in English (UK) on a daily basis for the month of May 2004. In both languages, the out-of-vocabulary (OOV) rate is reduced by more than half.
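As a rough illustration of the idea in the abstract (interpolating word frequencies from several adaptation corpora and choosing the mixture that gives the best coverage on a development corpus), here is a minimal sketch. It is not the article's algorithm: the abstract does not give the optimization procedure, so a coarse grid search over interpolation weights stands in for it, and all function names, the toy corpora, and the vocabulary size are hypothetical.

```python
from collections import Counter
from itertools import product


def relative_freqs(tokens):
    """Relative word frequencies estimated on one adaptation corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(freq_tables, weights):
    """Linear interpolation of several frequency tables (one weight per corpus)."""
    mixed = Counter()
    for table, weight in zip(freq_tables, weights):
        for word, freq in table.items():
            mixed[word] += weight * freq
    return mixed


def select_vocab(mixed, size):
    """Keep the `size` highest-scoring words; ties are broken alphabetically."""
    ranked = sorted(mixed.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:size]}


def oov_rate(vocab, dev_tokens):
    """Fraction of development tokens that are not covered by the vocabulary."""
    missing = sum(1 for token in dev_tokens if token not in vocab)
    return missing / len(dev_tokens)


def best_weights(freq_tables, dev_tokens, size, steps=10):
    """Grid search (an assumption, not the article's method) for the
    interpolation weights that minimize the OOV rate on the dev corpus."""
    grid = [i / steps for i in range(steps + 1)]
    best = None
    for weights in product(grid, repeat=len(freq_tables)):
        if abs(sum(weights) - 1.0) > 1e-9:
            continue  # only keep weight vectors that sum to one
        vocab = select_vocab(interpolate(freq_tables, weights), size)
        rate = oov_rate(vocab, dev_tokens)
        if best is None or rate < best[0]:
            best = (rate, weights)
    return best


# Toy data: an older corpus and a recent one, plus a recent development text.
older = "the government said the budget the budget".split()
recent = "the election the election candidate".split()
dev = "the election candidate".split()

tables = [relative_freqs(older), relative_freqs(recent)]
rate, weights = best_weights(tables, dev, size=2)
print(f"best OOV rate: {rate:.3f} with weights {weights}")
```

In this toy setting the search favours the recent corpus because its frequent words also occur in the development text, which is the intuition behind adapting a vocabulary from up-to-date news sources.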


Similar resources

How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News

Out-Of-Vocabulary (OOV) words missed by Large Vocabulary Continuous Speech Recognition (LVCSR) systems can be recovered with the help of topic and semantic context of the OOV words captured from a diachronic text corpus. In this paper we investigate how the choice of documents for the diachronic text corpora affects the retrieval of OOV Proper Names (PNs) relevant to an audio document. We first...


Proper name retrieval from diachronic documents for automatic speech transcription using lexical and temporal context

Proper names are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving new proper names from contemporary diachronic text documents. The idea is to use in-vocabulary proper names as an anchor to collect new linked proper names from the diachronic corpus. Our assump...


Representing Polysemy and Diachronic Lexico-Semantic Data on the Semantic Web

In this article we will outline two different vocabularies, both extensions of the lemon model, for representing diachronic lexico-semantic data on the Semantic Web. This is especially useful for representing the evolution of scientific terminologies where many terms are polysemous and/or imported from other languages. The first vocabulary, polyLemon, allows for the representation of data about ...


Morphosemantic fields in the analysis of Croatian vocabulary

This paper presents the morphosemantic field model, claiming that it is relevant in the description of lexical structures in grammatically-motivated languages such as Croatian. Arguments are presented for the applicability of the model in synchronic and diachronic lexical analysis. The fact that many characteristics of morphosemantic fields are compatible with the theoretical framework of cogni...


Optimality and diachronic adaptation

In this programmatic paper, I argue that the universal constraints of Optimality Theory (OT) need to be complemented by a theory of diachronic adaptation. OT constraints are traditionally stipulated as part of Universal Grammar, but this misses the generalization that the grammatical constraints normally correspond to constraints on language use. As in biology, observed adaptive patterns in lan...




Publication date: 2005